Recent experimental results suggest that the native fold, or topology, plays a primary role in 
determining the structure of the transition state ensemble, at least for small fast folding proteins. 
To investigate the extent of the topological control of the folding process, we study the folding of 
simplified models of five small globular proteins constructed using a Go-like potential in order to 
retain the information about the native structures but drastically reduce the energetic frustration 
and energetic heterogeneity among residue-residue native interactions. By comparing the structure 
of the transition state ensemble experimentally determined by Φ-values and of the intermediates 
with the ones obtained using our models, we show that these energetically unfrustrated models can 
reproduce the global experimentally known features of the transition state ensembles and "on-route" 
intermediates, at least for the analyzed proteins. This result clearly indicates that, as long as the 
protein sequence is sufficiently minimally frustrated, topology plays a central role in determining 
the folding mechanism. 
I. INTRODUCTION 

Our understanding of the protein folding problem has been
thoroughly changed by the new view that has emerged in the
last decade. This new view, based on the energy landscape
theory and funnel concept (Leopold, Montal & Onuchic 1992,
Bryngelson, Onuchic, Socci & Wolynes 1995, Socci, Onuchic &
Wolynes 1996, Onuchic, Luthey-Schulten & Wolynes 1997, Dill &
Chan 1997, Nymeyer, Garcia & Onuchic 1998, Klimov &
Thirumalai 1996, Mirny, Abkevich & Shakhnovich 1996, Shea,
Nochomovitz, Guo & Brooks III 1998), describes folding as
the progressive evolution of an ensemble of partially folded
structures through which the protein moves on its way to the
native structure. The existence of a deep energy funnel in
natural proteins and the relatively simple connectivity
between most conformational states which are structurally
close makes this description possible even when only a few
simple reaction coordinates that measure similarity to the
native structure are used. The folding mechanism is
controlled by both the shape of this free energy landscape
and the roughness on it, which arises from the conflicts
among interactions that stabilize the folded state and
therefore can create non-native conformational traps
(Bryngelson & Wolynes 1987, Bryngelson & Wolynes 1989,
Goldstein, Luthey-Schulten & Wolynes 1992).

The energetic roughness, however, is not the only limiting
factor in determining a sequence's foldability. Even if the
energetic roughness could be completely removed, the folding
landscape would not be completely smooth. Theoretical
(Wolynes 1996, Nelson, Eyck & Onuchic 1997, Nelson & Onuchic
1998, Onuchic, Socci, Luthey-Schulten & Wolynes 1996, Socci,
Nymeyer & Onuchic 1997, Betancourt & Onuchic 1995,
Sheinerman & Brooks III 1998a, Micheletti, Banavar, Maritan &
Seno 1999) and experimental (Grantcharova, Riddle, Santiago &
Baker 1998, Martinez, Pisabarro & Serrano 1998) advances
indicate that the final structure of the protein also plays
a major role in determining a protein's foldability. Some
particular folding motifs may be intrinsically more
designable than others. To address this difference in
foldability which is not dependent on energetic frustration,
we have introduced the concept of "topological frustration"
(Nymeyer, Socci & Onuchic 2000, Onuchic, Nymeyer, Garcia,
Chahine & Socci 1999, Shea, Onuchic & Brooks III 1999).

Let us imagine an ideal situation for which the order of
native contact formation during folding is not biased. In
this "ideal" situation, there are an enormously large number
of equivalent folding pathways, and an analysis of the
transition state ensemble would show that for this ensemble
nearly all parts of the protein have a similar probability
of participation. The structure in the transition ensemble
has been estimated by analogy with minimalist lattice models
made to reproduce the global landscape features of small,
fast folding proteins: similar Levinthal entropies,
stabilities and energetic roughness as gauged by the glass
transition temperatures. These models show a transition
state ensemble about half way through the unfolded and
folded states (Onuchic, Wolynes, Luthey-Schulten & Socci
1995). In this ideal case, all the contacts in this
transition ensemble would exist with the same probability.

Although the average amount of native formation in the
transition ensemble is about 50%, the lattice simulations
show that, even when the sequence is designed to have
substantially reduced energetic frustration, there are
variations in the amount of nativeness of specific contacts
in the transition state ensemble (Onuchic et al. 1996,
Onuchic et al. 1993, Nymeyer et al. 2000). Real proteins
display similar heterogeneity in contact formation. In
systems with no energetic frustration and equal native
interactions, these variations in the transition state
ensemble are created solely by the folding motif and
polymeric constraints that make certain contacts more
geometrically accessible and stable than others. This
variation in the frequency with which some contacts are made
in the transition state ensemble generally reduces the
entropy of the transition state and, when determined by the
native motif, is a gauge of the amount of "topological
frustration" in the system. Although this type of
frustration can be modified by some design tricks (Plotkin &
Onuchic 1999), it cannot be completely eliminated: it
reflects an intrinsic difficulty in folding to a
particularly chosen shape. Minimalist models have shown how
this heterogeneity leads to a transition ensemble that is a
collection of diffuse nuclei which have various levels of
native contact participation (Onuchic et al. 1996). The
minimalist models calibrated to real proteins show similar
overall levels of contact heterogeneity as in real proteins
(Onuchic et al. 1996). This picture of a transition state
composed of several diffuse nuclei has been confirmed by
other lattice and off-lattice studies (Klimov & Thirumalai
1998, Pande & Rokhsar 1999). In addition to selecting
sequences which have low levels of energetic frustration,
evolution appears to have selected for a particular set of
folding motifs which have reduced levels of "topological
frustration", discarding other structures to which it is too
difficult to fold (Betancourt & Onuchic 1995, Wolynes,
Schulten & Onuchic 1996, Nelson & Onuchic 1998, Micheletti
et al. 1999, Debe, Carlson & Goddard 1999).

Guided by theoretical folding studies on lattice,
off-lattice, and all-atom simulations (see for instance
Onuchic et al. (1995), Onuchic et al. (1996), Boczko &
Brooks III (1995), [illegible] et al. (1999), Nymeyer et al.
(2000), Shea et al. (1999)) as well as recent experimental
evidence (Grantcharova et al. 1998, Martinez et al. 1998,
Chiti, Taddei, White, Bucciantini, Magherini, Stefani &
Dobson 1999, Martinez & Serrano 1999, Riddle, Grantcharova,
Santiago, Alm, Ruczinski & Baker 1999) we suggest that real
proteins, and especially small, fast folding (sub
millisecond), two-state like proteins, have sequences with a
sufficiently reduced level of energetic frustration that the
experimentally observed "structural
polarization" of the transition state ensemble (viz. the 
variation in the amount of local native structure) is pri- 
marily determined by the topological constraints. That 
is, in well designed sequences, the variations are more de- 
termined by the type of native fold than by differences in 
sequence which leave the native fold relatively unchanged 
and the energetic frustration small. 

The amount of native local structure in the transition
state can be experimentally measured by using single and
double point mutants as probes in the Φ value
technique (Fersht 1994). If the topology is a dominant
source of heterogeneity in transition state structure, then 
the majority of evolved sequences which fold to the same 
motif would exhibit similar local structure in the
transition state ensemble. We provide evidence in this paper
that this is the case: much of the transition state ensemble
is determined by the final folded form. Moreover, for larger
proteins that are not two-state folders, some "on-route"
intermediates are determined by topological effects as well.
Thus it appears that
the dominance of topology in folding extends even into 
some larger, slower folding proteins with intermediates. 
This fact is consistent with some recent observations by 
Plaxco and collaborators that reveal a substantial corre- 
lation between the average sequence separation between 
contacting residues in the native structure and the folding
rates for single domain proteins (Plaxco, Simons & Baker
1998, Chan 1998).
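
As a rough illustration of the quantity behind this correlation, the average sequence separation between contacting residues can be computed directly from a list of native contact pairs. The sketch below is not taken from the cited work: the function names, the normalization by chain length, and the toy contact lists are assumptions made only for illustration.

```python
def mean_sequence_separation(contacts):
    """Average |i - j| over native contact pairs (i, j)."""
    if not contacts:
        raise ValueError("need at least one native contact")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_separation(contacts, n_residues):
    """Mean sequence separation divided by chain length (assumed normalization)."""
    return mean_sequence_separation(contacts) / n_residues


# Hypothetical toy contact maps: a local hairpin versus long-range pairing.
hairpin = [(1, 8), (2, 7), (3, 6)]
long_range = [(1, 20), (2, 19), (3, 18)]

print(mean_sequence_separation(hairpin))
print(relative_separation(long_range, n_residues=20))
```

A fold dominated by local contacts gives a small value, while a fold that pairs residues far apart in sequence gives a large one.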

To ascertain the extent of topological control of the 
folding behavior, we create several simplified energetic 
models of small, globular proteins using potentials cre- 
ated to minimize energetic frustration. We show that 
these energetically unfrustrated models reproduce nearly 
all the known global features of the transition states of 
the real proteins on whose native structures they are 
based, including the structure of folding intermediates. 
We directly compare the structure of the transition state 
ensemble experimentally determined by Φ value measurements
with the numerically determined one. The simulated
transition state ensemble is inferred from structures
sampled in equilibrium around the free energy barrier be- 
tween the folded and unfolded states. This free energy 
is computed as a function of a single reaction coordinate 
that measures the fraction of formed native contacts. The 
validity of this method has been demonstrated in references
(Onuchic et al. 199?, Nymeyer et al. 2000).

The organization of the paper is as follows: in section II
we present in some detail the physical concepts underlying
this work in the light of recent experimental results. In
section III we present results for a sample of five small,
globular proteins, and compare these results against the
available experimental data. The off-lattice model used in
our study is presented in the Appendix. In order to
investigate the relevance of the topology, we chose a model
which reproduces the topological features of a given real
protein and eliminates most of the energetic frustration and
variations in the strength of native residue-residue
contacts. The predicted transition states for these proteins
are in good agreement with experimental evidence, supporting
our hypothesis of the major role played by topology.



II. CHECKING THE FOLDING MECHANISM BY 
ANALYZING THE TRANSITION STATE 
ENSEMBLE 

How do we know what the folding transition state ensemble
looks like? Experimental analysis of folding transition
state ensembles has been largely performed using the Φ-value
analysis technique introduced by Fersht and co-workers
(Fersht 1994). Φ values measure the effects that a mutation
at a given position along the chain has on the folding rate
and stability:



\[ \Phi = -\frac{RT \ln(k_{mut}/k_{wt})}{\Delta\Delta G^{\circ}} \qquad (1) \]



where k_mut and k_wt are the mutant and wild-type folding
rates respectively, R is the ideal gas constant, T is the
absolute temperature, and ΔΔG° is the difference in the
total stability between the mutant and wild-type proteins in
kcal/mol.
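
As a small worked illustration of equation (1), the sketch below converts a pair of folding rates and a stability change into a Φ value. It is only a sketch under stated assumptions: the function name, the default temperature, and the sign convention (ΔΔG° positive for a destabilizing mutation) are choices made here, not details given in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def phi_from_rates(k_mut, k_wt, ddg_stability, temperature=298.0):
    """Phi = -RT ln(k_mut / k_wt) / ddG_stability, following equation (1).

    ddg_stability is the mutant minus wild-type stability change in kcal/mol
    (assumed positive when the mutation destabilizes the native state).
    """
    if k_mut <= 0 or k_wt <= 0:
        raise ValueError("folding rates must be positive")
    if ddg_stability == 0:
        raise ValueError("Phi is undefined when the stability change is zero")
    ddg_barrier = -R_KCAL * temperature * math.log(k_mut / k_wt)
    return ddg_barrier / ddg_stability


# Hypothetical numbers: the mutant folds 5 times slower and loses 1.5 kcal/mol.
print(phi_from_rates(k_mut=20.0, k_wt=100.0, ddg_stability=1.5))
```

An unchanged folding rate gives Φ = 0, and a barrier change equal to the full stability change gives Φ = 1.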

Because the folding event of small fast folding proteins
is well described as a diffusive process over a barrier
determined by the free energy profile, the folding rate can
be written as a Kramers-like equation (Socci et al. 1996)



\[ k = k_0 \exp[-\Delta G^{\ddagger}/RT] \qquad (2) \]



where k_0 is a factor depending on the barrier shape and the
configurational diffusion coefficient of the system. If k_0
is insensitive to small sequence changes (Onuchic et al.
199?, Socci et al. 1996, Nymeyer et al. 2000, Shea et al.
1999, Onuchic et al. 1999, Scalley & Baker 1997, Munoz &
Eaton 1999), the Φ value is then seen to be a ratio of free
energy changes of the folding barrier to stability:



\[ \Phi = \frac{\Delta\Delta G^{\ddagger}}{\Delta\Delta G^{\circ}} \qquad (3) \]



where ΔΔG‡ is given by

\[ \Delta\Delta G^{\ddagger} = \Delta G^{\ddagger}_{mut} - \Delta G^{\ddagger}_{wt} = -RT \ln(k_{mut}/k_{wt}). \qquad (4) \]



When this relationship is valid and the mutation can be
considered a small perturbation, the Φ value is a convenient
measure of the fraction of native structure which is formed
in the transition state ensemble around the site of the
mutation. A Φ value close to 1 means that the free energy
change between the mutant and the wild type is almost the
same in the transition state and native state, indicating
that native contacts involving the mutated residue are
already formed at the transition state. Inversely, a Φ close
to 0 means that the free energy change is the same in the
transition state and unfolded states, so the local
environment of the residue is probably unfolded-like. A
detailed analysis of the mutation is needed to determine
exactly what contacts are disrupted under mutation. Ideally,
mutations are made which eliminate small hydrophobic
side-groups. Studies using Φ values with multiple same-site
mutations generally support the accuracy of Φ values of the
transition ensemble (Matouschek, Otzen, Itzhaki, Jackson &
Fersht 1995), although sizable changes in the transition
state structure have been induced in at least one protein
through a single point mutation (Burton, Huang, Daugherty,
Calderone & Oas 1997). In interpreting Φ values, it is also
important to remember that they only measure the relative
change in structure, not the absolute amount of structure.
This leads to the possibility that some mutants with low Φ
values may have nearly native local environments in the
transition state, a possibility seen clearly in the
experimental studies of Procarboxypeptidase A2 (Villegas,
Martinez, Aviles & Serrano 1998).

The validity of Φ values as structural measurements clearly
supports the Kramers-like description of the folding rate
and the fact that the Φ can be properly understood as a
ratio of the free energy change of the transition ensemble
over the change of the native ensemble (equation 3). This
latter equation is very convenient as a starting point for
computing Φ values. In several recent simulation papers for
lattice and off-lattice protein models, we have investigated
this issue at length (Nymeyer et al. 2000, Onuchic et al.
1999, Shea et al. 1999). All these studies concluded that as
long as the systems present a weak or moderate level of
energetic frustration (such as the Go-like models in this
work), Φ values determined from changes in the free energy
barrier, determined using a single simple reaction
coordinate, yield quantitatively correct Φ values.
Therefore, all the calculations performed in this work were
done utilizing eq. 3; no actual kinetics was performed, only
the appropriate sampling of the protein configurational
space (see Appendix and refs. Socci & Onuchic (1995),
Boczko & Brooks III (1995), Onuchic et al. (1996), Nymeyer
et al. (2000), for example, for details). Technically, as
long as the folding barriers are of a few k_B T or more and
the displacement of the barrier position along this reaction
coordinate under mutation is sufficiently small, the Φ
values can be computed using free energy perturbation:

\[ \Phi = \frac{\Delta\Delta G_{TS} - \Delta\Delta G_{U}}{\Delta\Delta G_{F} - \Delta\Delta G_{U}} = \frac{\ln\langle e^{-\Delta E/RT}\rangle_{TS} - \ln\langle e^{-\Delta E/RT}\rangle_{U}}{\ln\langle e^{-\Delta E/RT}\rangle_{F} - \ln\langle e^{-\Delta E/RT}\rangle_{U}} \qquad (5) \]

We use equation 5 to compute Φ values for our protein
models using fixed transition, unfolded, and folded regions
identified by the free energy profile viewed using a single
order parameter: Q, the fraction of native contacts formed
in a given conformation.

What experimental evidence exists as to the role of topology
in determining the average structure in the folding
transition state ensemble? The clearest evidence to date of
the role of topology comes from comparisons of the
transition state structure of two homologues of the SH3
domain (src SH3 and α-spectrin SH3). These two homologues
have only weak identity (≈ 30% identity with gaps), but Φ
values at corresponding sequence positions are highly
correlated (Grantcharova et al. 1998, Martinez et al. 1998),
supporting the degeneracy in the folding behavior for these
two sequences. Furthermore, one of these sequences has a
strained φ-ψ conformation in the high Φ region of the distal
turn. The fact that this strain does not detectably lower
the Φ values in the local neighborhood (Martinez et al.
1998) suggests that the sequence details and local stability
are less important for determining how structured a region
is in the transition state ensemble than its location in the
final folded conformation. Other evidence indicates that
these results may be more generally applicable than simply
for SH3 or β-sheet proteins. Sequence conservation has been
shown not to correlate with Φ values (Kim, Gu & Baker 1998),
indicating that in general sequence changes at a given
position in a protein weakly affect the Φ value at that
position.

Results for some small fast folding proteins (such as CI2
and the λ-repressor) suggest that the transition state is an
expanded version of the native state, with a certain degree
of additional inhomogeneity over the structure (Itzhaki,
Otzen & Fersht 1995, Burton et al. 1997) (similar to the
theoretical predictions for small α-helical proteins
(Onuchic et al. 1995, Boczko & Brooks III 1995)), while
results for other proteins (as the β-sheet SH3 domain) show
apparently larger structural heterogeneity in the transition
state (Sheinerman & Brooks III 1998a, Sheinerman & Brooks
III 1998b). This difference in the degree of "structural
polarization" that is emerging between small α-helix and
β-sheet proteins suggests that the folding mechanism of a
given protein is fundamentally tied to the type of secondary
structural elements and their native arrangement. Current
studies using the Φ value technique have been made of src
SH3 (Grantcharova et al. 1998), α-spectrin SH3 (Martinez et
al. 1998), CI2 (Itzhaki et al. 1995), Barnase (Fersht,
Matouschek & Serrano 1992), Barstar (Killick, Freund &
Fersht 1999), λ-repressor (Burton et al. 1997), CheY
(Lopez-Hernandez & Serrano 1996), protein L (Kim, Yi,
Gladwin, Goldberg & Baker 1998), Procarboxypeptidase A2
(Villegas et al. 1998), RNase H (Raschke, Kho & Marqusee
1999) and the tetrameric protein domain from tumor
suppressor p53 (Mateu, Del Pino & Fersht 1999).
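
The free energy perturbation ratio of equation (5) can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the authors' code: each sampled conformation is assumed to carry an energy change ΔE caused by the mutation (for a removed attractive contact of strength ε, taken here as ε when the contact is formed and 0 otherwise), and the function names are hypothetical.

```python
import math


def log_mean_boltzmann(delta_e, rt):
    """Return ln< exp(-dE / RT) > averaged over one ensemble of samples."""
    if not delta_e:
        raise ValueError("ensemble must contain at least one sample")
    return math.log(sum(math.exp(-de / rt) for de in delta_e) / len(delta_e))


def phi_free_energy_perturbation(de_ts, de_unfolded, de_folded, rt):
    """Phi from mutation energy changes sampled in the TS, unfolded and folded ensembles."""
    unfolded = log_mean_boltzmann(de_unfolded, rt)
    numerator = log_mean_boltzmann(de_ts, rt) - unfolded
    denominator = log_mean_boltzmann(de_folded, rt) - unfolded
    if denominator == 0:
        raise ValueError("mutation does not change stability; Phi is undefined")
    return numerator / denominator


# Toy data with epsilon = RT = 1: the contact is always formed when folded,
# never formed when unfolded, and formed in half of the transition state samples.
folded = [1.0] * 10
unfolded = [0.0] * 10
transition = [1.0] * 5 + [0.0] * 5
print(phi_free_energy_perturbation(transition, unfolded, folded, rt=1.0))
```

Because the averages are exponential, a contact formed in half of the transition state samples does not give exactly Φ = 0.5; the value depends on ε/RT.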



In this paper, we analyze five proteins (SH3, CI2, Bar- 
nase, RNase H and CheY) that have been extensively 
studied experimentally and for which, therefore, details 
of their transition state ensemble are quite well known. 
We generate sequences (and potentials) for simulating 
these different globular proteins. These sequences have 
the native backbone folds of real experimentally studied
globular proteins but sequence and potential interactions
designed to drastically reduce the energetic frustration
and heterogeneity in residue-residue interactions. 
By comparing the transition state structures of these un- 
frustrated models with the experimental studies of their 
real protein cousins, we quantify the effects of the native 
topology. If topology completely determines how fold- 
ing occurs, then the model and real proteins should have 
identical folding behavior and Φ values. If energetic
frustration and heterogeneity are critical for determining
the folding mechanism, then the variations in Φ values with 
position should bear far reduced similarity to those in 
the real proteins on which the computer homologues are 
based. 

Two of the five studied proteins are simple two-state 
like fast folding proteins (SH3 and CI2), while the other 
three (Barnase, RNase H and CheY) are known to fold
through the formation of an intermediate state. We show
not only that our simple models can reproduce most of
the Φ value structure, but also that models for Barnase,
RNase H and CheY correctly reproduce the folding
intermediates of these proteins, suggesting that many of the 
"on-route" intermediates are also largely determined by 
the type of native fold. 

We represent the five globular proteins using a simplified
Cα model with a Go-like (Ueda, Taketomi & Go 1975)
Hamiltonian as detailed in the Appendix. This potential is
in its details unlike that of real proteins, which have
residue-residue interactions with many components (Coulomb
interactions, hydrogen bonding, solvent mediated
interactions, etc.). The crucial features of this potential
are its low level of energetic frustration, which
characterizes good folders, and a native conformation equal
to that of the real protein. The ability of this 
model to reproduce features of the real transition state 
ensemble and real folding intermediates is a strong in- 
dication that the retention of the topology is enough 
to determine the global features of their folding mech- 
anism. Using these models, we simulate the dynamics of 
a protein starting from its native structure, for several 
temperatures. To monitor the thermodynamics of the 
system, we group the configurations obtained during a 
simulation as a function of the reaction coordinate, Q, 
defined as the fraction of the native contacts formed in 
a conformation (Q = 0 at the fully unfolded state and 
Q = 1 at the folded state). The choice of Q as order 
parameter for the folding is motivated by the fact that in 
a funnel-like energy landscape, a well designed sequence 
has the energy of its conformations reasonably correlated 
to degree of nativeness, and the parameter Q is a good 
measure of the degree of similarity with the native struc- 
ture. Our Go-like potential is minimally frustrated for 
the chosen native structure, and the prediction of transi- 
tion state ensemble structures and folding rates for these 
Go-like systems has been shown to be quite accurate
(Socci et al. 1996, Shea et al. 1999, Nymeyer et al. 2000).
From the free energy profile as a function of Q, it is easy
to locate the unfolded, folded and transition state
ensembles, as shown in the next section. Since these models
consider totally unfrustrated sequences, they may not
reproduce the precise energetics of the real proteins, such
as the value of the barrier heights and the stability of the
intermediates; nonetheless, they are able to determine the
general structure of these ensembles.
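
The idea of locating the unfolded, folded and transition state ensembles from a free energy profile in Q can be sketched as below. This is a hedged illustration only: the binning scheme, the function names, and the rule of taking the highest bin between the two minima as the barrier are assumptions made here, and the synthetic Q samples stand in for real simulation data.

```python
import math


def free_energy_profile(q_samples, n_bins, rt=1.0):
    """F(Q) = -RT ln P(Q) on a uniform grid over [0, 1]; empty bins get +inf."""
    counts = [0] * n_bins
    for q in q_samples:
        index = min(int(q * n_bins), n_bins - 1)  # Q = 1 falls in the last bin
        counts[index] += 1
    total = len(q_samples)
    return [-rt * math.log(c / total) if c else math.inf for c in counts]


def barrier_bin(profile):
    """Highest-free-energy bin lying between the unfolded and folded minima."""
    half = len(profile) // 2
    unfolded = min(range(half), key=profile.__getitem__)
    folded = min(range(half, len(profile)), key=profile.__getitem__)
    return max(range(unfolded, folded + 1), key=profile.__getitem__)


# Synthetic two-state data: counts per bin for ten bins of width 0.1.
bin_counts = [0, 40, 20, 8, 3, 2, 4, 10, 40, 0]
samples = [(b + 0.5) / 10 for b, c in enumerate(bin_counts) for _ in range(c)]

profile = free_energy_profile(samples, n_bins=10)
print(barrier_bin(profile))
```

Conformations whose Q falls in the barrier bin (or a window around it) would then serve as the transition state ensemble in this simplified picture.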

In order to compare the folding process simulated using our
model to the actual process for a given protein (as obtained
from experimental Φ value analysis), we need to choose a
"mutation" protocol to compute Φ values. Experimentally, the
ideal mutation is typically one that removes a small
hydrophobic side-group such as a methyl group that makes
well-defined and identifiable residue-residue contacts in
the native state. The Φ value is then sensitive to this
known contact. Our computational mutation is the removal of
a single native bond, so our computer Φ values are sensitive
to the fractional formation of this bond Q_ij between
residues i and j. We make these mutations because, as in
most real mutations, they are sensitive to the formation of
specific contacts, rather than being averages over
interactions with many parts of the native structure. They
mostly resemble the interaction Φ_int value obtained from
double cycle mutants (Fersht et al. 1992). Φ values are
computed from equation 5. In an ideal, perfectly smooth
funnel-like energy landscape, all the Φ values should be
equal; in an energetically unfrustrated situation, Φ value
variations are due to the structure of the native
conformation.
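
The fractional formation of an individual native bond, Q_ij, within a chosen ensemble can be estimated by simple counting. The sketch below assumes that each sampled conformation is represented as the set of contacts it has formed and that the ensemble is selected by a window in Q; both the representation and the function name are hypothetical choices made for illustration.

```python
def contact_formation(conformations, native_contacts, q_min, q_max):
    """Fraction of conformations with q_min <= Q <= q_max in which each native contact is formed.

    conformations: list of sets of (i, j) pairs formed in each sampled structure.
    native_contacts: set of (i, j) pairs present in the native structure.
    """
    n_native = len(native_contacts)
    window = [
        formed for formed in conformations
        if q_min <= len(formed & native_contacts) / n_native <= q_max
    ]
    if not window:
        raise ValueError("no conformations fall inside the requested Q window")
    return {
        pair: sum(pair in formed for formed in window) / len(window)
        for pair in native_contacts
    }


# Toy data: four native contacts and four sampled conformations.
native = {(1, 5), (2, 6), (3, 9), (4, 10)}
sampled = [
    {(1, 5), (2, 6)},   # Q = 0.5
    {(1, 5), (3, 9)},   # Q = 0.5
    {(1, 5)},           # Q = 0.25
    set(native),        # Q = 1.0
]
q_ij = contact_formation(sampled, native, q_min=0.4, q_max=0.6)
print(q_ij[(1, 5)], q_ij[(4, 10)])
```

In this toy ensemble all conformations in the window have the same Q, yet individual contacts differ in how often they are formed, which is the kind of heterogeneity the Φ values probe.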



III. DETERMINING THE TRANSITION STATE 
ENSEMBLE OF SMALL GLOBULAR PROTEINS 

We have discussed the idea of "topological frustration" 
and its role in determining the structural heterogeneity of 
the transition state ensemble. We explore its role directly 
by creating protein models which drastically reduce the 
energetic frustration and energetic heterogeneity among 
residue-residue native interactions leaving the topology 
as the primary source of the residual frustration. Results 
obtained with these models, constructed using a Cα level 
of resolution with a Go-like potential designed to fold 
to the native trace of chosen proteins, are then compared 
against the experimental data of those proteins. Five pro- 
teins with different folding motifs and different amounts 
of transition state heterogeneity (variation in Φ values) 
and/or intermediates have been investigated. 

We first analyze Chymotrypsin Inhibitor 2 (CI2), a mixed
α-β protein with a broad distribution of Φ values (nearly
uniform from 0 to 1). Then we present an analysis for the
src SH3 domain, a largely β-sheet protein with a more
polarized transition state structure (a substantial number
of large Φ values). We then apply the same technique to
Barnase, Ribonuclease H (RNase H) and CheY, three other
mixed α-β proteins which fold via a folding intermediate.
Although these proteins are not two-state folding proteins,
we demonstrate that topology is also the dominant
determinant of their folding behavior. We show that the
topology plays a major role not only in the transition
state ensemble, but it is also 
largely responsible for the existence and general struc- 
ture of the folding intermediate. This result may be quite 
common for "on-route" folding intermediates and could 
provide a computational method for distinguishing be- 
tween "on-pathway" and "off-pathway" structures which 
are inferred from experiments. To check the applicabil- 
ity of this method, the same approach presented in this 
paper has been extended elsewhere (Clementi, Jennings &
Onuchic 1999) to a pair of larger proteins (Dihydrofolate
Reductase and Interleukin-1β). Even for these very 
large proteins we found that the overall structure of the 
transition state and intermediate ensembles experimen- 
tally observed can be obtained utilizing similar simplified 
models. 



A. Analysis of two-state folders: CI2 and SH3 

1. CI2 

The Chymotrypsin Inhibitor 2 (CI2) protein is a 64 residue
protein, consisting of six β-strands packed against an
α-helix. Experimental studies (Jackson & Fersht 1991b,
Jackson & Fersht 1991a, Jackson, Moracci, elMasry, Johnson &
Fersht 1993, Jackson, elMasry & Fersht 1993) have
established that CI2 folding and unfolding can be modeled by
simple two-state kinetics. The structure of the transition
state for this protein has been extensively characterized by
protein engineering (Itzhaki et al. 1995, Otzen & Fersht
1995, Jackson & Fersht 1991b), by free energy functional
approaches (Shoemaker & Wolynes 1999, Shoemaker, Wang [...])
[...] (Micheletti et al. 1999), and by all-atom molecular
dynamics simulations (Li & Daggett 1996, Kazmirski, Li &
Daggett 1999, Lazaridis & Karplus 1997). These studies have
shown the transition state has roughly half of the native
interactions formed in the transition state ensemble and a
broad distribution of Φ values in agreement with the general
predictions of the energy landscape theory used with a law
of corresponding states for small proteins (Onuchic et al.
1995, Onuchic et al. 1996). The broad distribution of Φ
values suggests that most hydrophobic contacts are
represented at a level of about 50% in the transition state
ensemble.

We constructed a Go-like Cα model of CI2 as described in
the Appendix. Several fixed temperature simulations were
made and combined using the WHAM algorithm (Swendsen 199?)
to generate a specific heat versus temperature profile and a
plot of the potential of mean force as a function of the
folding order parameter Q (see figure 1). From the free
energy profile, we identified the dominant barrier, and used
the thermal ensemble of states at its location to generate Φ
values from equation 5. The ranges of values of Q used to
determine each of these ensembles are shaded in figure 1.
The mutations have been implemented by the removal of single
attractive interactions (they are replaced with the same
short ranged repulsive interactions used between residues
without native interactions). The Φ values computed via this
method are shown in figure 2. Also shown in this figure is
the fractional formation of individual native contacts in
the transition state. The small difference between these two
figures is primarily due to the fact that in the Φ
calculations the native contact formation at the folded and
unfolded states are also taken into account. Because of the
higher concentration of contacts between residues nearby in
sequence and the local conformational preferences, the
unfolded state shows a high level of local structure. The
inaccurate representation of local contacts in the unfolded
state makes the short range Φ values less reliable as
transition structure estimates than long range Φ values.
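
To give a feel for how a specific heat versus temperature curve can be obtained from sampled energies, the sketch below uses single-histogram reweighting of one fixed-temperature run. This is a deliberately simpler technique than WHAM, which combines several temperatures, and it is not the procedure used in the paper; the function name, the reduced units (k_B = 1 by default), and the toy energies are assumptions.

```python
import math


def reweighted_specific_heat(energies, t_sim, t_target, k_b=1.0):
    """C_v(T) = (<E^2> - <E>^2) / (k_B T^2), with samples from t_sim reweighted to t_target."""
    beta_shift = 1.0 / (k_b * t_target) - 1.0 / (k_b * t_sim)
    exponents = [-beta_shift * e for e in energies]
    largest = max(exponents)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(x - largest) for x in exponents]
    norm = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / norm
    mean_e2 = sum(w * e * e for w, e in zip(weights, energies)) / norm
    return (mean_e2 - mean_e ** 2) / (k_b * t_target ** 2)


# Toy energies (reduced units) from a hypothetical run at T = 1.0.
energies = [0.0, 0.0, 2.0, 2.0]
for temperature in (0.8, 1.0, 1.2):
    print(temperature, reweighted_specific_heat(energies, 1.0, temperature))
```

Reweighting is only reliable near the simulation temperature, where the sampled energies overlap with those relevant at the target temperature; that limitation is what multi-temperature methods such as WHAM address.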

From the calculations, we detect three significant regions
of large Φ values: the α-helix, the mini-core defined by
strands 3 and 4 and their connecting loop, and between the
C-terminus of strand 4 and the N-terminus of strand 5. These
regions generally have Φ values in excess of 0.6. Slightly
smaller values of about 0.5 exist for the short range
contacts between the N-terminus of strand 3 and the
C-terminus of the α-helix and for contacts between strand 3
and strand 4. All other regions lack a consistent set of
large Φ values. Despite the large number of native contacts
between strands 1 and 2 and the α-helix and between strands
5 and 6 and the α-helix, only low Φ values are observed in
this region (nearly all below 0.2 in value). A comparison
between these data and the exhaustive analysis of Fersht and
colleagues (Otzen & Fersht 1995) shows excellent overall
agreement. They have found that "β-strands 1, 5 and 6
... are not structured in the transition state....". Strand
2 also shows a highly reduced amount of structure.
Furthermore, "the central residues of β-strands 3 and 4
interact with the α-helix to form the major hydrophobic
core of CI2." The hydrophobic mini-core in this region
(defined as the cluster formed by side-chains of residues
32, 38, and 50) is detected by single mutant and double
mutant Φ-values (Itzhaki et al. 1995) to be at least 30%
formed in the transition ensemble. Similarly, they found the
α-helix, particularly the N-capping region, to be highly
ordered.

In summary, we see a quite good overall agreement 
except for a discrepancy in the short range interactions 
in the loop region between strands 4 and 5. This protein
shows generally higher Φ values between interactions
which are more local in sequence and lower Φ values
between interactions which are distant in sequence. The
results are thus consistent with the picture of the
transition state as a collection of non-specific and somewhat
diffuse nuclei (Onuchic et al. 1995). This overall low level 
of frustration suggests a low level of "topological frustra- 
tion" in this model as well and a particularly designable 
motif. 

2. src SH3 domain

Src SH3 is the 57 residue fragment of Tyrosine-Protein Kinase that
stretches from T84 to S140. It has five β strands (and a short 3-10
helix) in an anti-parallel arrangement, forming a partial β
sandwich. Experimental measurements have shown that the SH3 domain
folds using a rapid, apparently two-state mechanism. A Φ value
analysis (Grantcharova et al. 1998) reveals that the distal loop
hairpin and diverging turn regions are both highly structured and
docked together at the transition state; the hydrophobic
interactions between the base of the hairpin and the strand
following the diverging turn are partially formed, while other
regions of src SH3 appear only weakly ordered in the transition
state ensemble. The overall representation of the transition state
structure of src SH3, having the distal loop and diverging turn
largely formed and other regions weakly formed, agrees with studies
of α-spectrin SH3 (Martinez et al. 1998), which has a similar
backbone structure but a dissimilar sequence (approximately 30%
identity with gaps). This observed similarity, along with evidence
of a strained backbone conformation in the distal loop of the
α-spectrin SH3 (Martinez et al. 1998), supports the concept of
"topological" dominance in folding (Grantcharova et al. 1998).

Fig. 3 shows the folding behavior as obtained from our dynamics
simulations of the Go-like analogue of the src SH3. The free energy
barrier defining the transition state location is evident in the
figure. As before, we have computed Φ values from the equation used
for CI2 by mutating (removing) every native residue-residue
attractive contact. The results of this calculation are shown in
Fig. 4. In addition to Φ values, the contact formation probability
at the transition state ensemble has been calculated. Our previous
caveats concerning Φ values for local interactions still apply. We
observe that the highest collection of off-diagonal (long range) Φ
values is in the diverging turn-distal loop interaction, exactly as
seen from the experimental Φ value measurements. We see very low
values in the RT loop region, in accord with the two mutants in
this loop. We also see medium to high values between the two β
strands which are connected by the distal loop. The transition
state structure of the SH3 presents a substantially larger degree
of structural polarization than CI2, where the Φ values are much
more uniform. This suggests that SH3 has a backbone conformation
which is intrinsically more difficult to fold, i.e., there is a
greater level of "topological frustration" in this structure.
Nevertheless, the transition state composition is well reproduced
for both proteins.
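
The contact formation probability at the transition state ensemble can be illustrated with a minimal sketch. It assumes that the ensemble is available as a list of bead coordinates and that a contact counts as formed when its distance is below a factor gamma times the native distance, the criterion described in the Appendix. The function name, the data layout and the toy coordinates are hypothetical, and the plain (unweighted) average is a simplification.

```python
import math

def contact_probabilities(ensemble, native, gamma=1.2):
    """Fraction of conformations in which each native contact is formed.

    ensemble: list of conformations, each a list of (x, y, z) bead positions.
    native:   dict mapping a residue pair (i, j) to its native distance.
    gamma:    a contact counts as formed when its distance is below
              gamma times the native distance.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformation")
    counts = {pair: 0 for pair in native}
    for coords in ensemble:
        for (i, j), d_native in native.items():
            if math.dist(coords[i], coords[j]) < gamma * d_native:
                counts[(i, j)] += 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}

# Toy data (hypothetical): one native pair, one compact and one stretched chain.
native = {(0, 4): 16.0}
compact = [(4.0 * k, 0.0, 0.0) for k in range(5)]
stretched = [(6.0 * k, 0.0, 0.0) for k in range(5)]
probs = contact_probabilities([compact, stretched], native)
print(probs)
```

In this toy ensemble the single native pair is formed in one of the two conformations, so its probability is one half. A real analysis would restrict the ensemble to conformations in the transition state region of Q.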



B. Analysis of three proteins which fold through
the formation of an intermediate state: Barnase,
RNase H and CheY

Barnase, RNase H and CheY are three small α-β proteins (although
larger than the previous two proteins): Barnase is a 110 residue
protein, composed of three α-helices (located in the first 45
residues) followed by five β-strands; RNase H consists of 155
residues which arrange themselves in five α-helices and five
β-strands; CheY is a 129 residue, classic α/β parallel fold in
which five β-strands are surrounded by five α-helices. Experimental
results show that these three proteins do not fold by following
simple two-state kinetics directly from the unfolded state to the
native structure, but fold through the formation of a metastable
intermediate which interconverts into the native state. This brings
up an interesting question: is topology alone able to determine the
presence of an intermediate in the folding process? In Figs. 5, 7
and 9 we show evidence for the first time that such intermediates
can be created solely from a Go-like minimalist model which
preserves the native topology. The presence of this intermediate
during these proteins' folding events is a requirement of the
native protein motifs. The free energy changes upon mutations of a
wild-type three-state protein are experimentally measured both for
the intermediate and the transition state, to define two different
sets of Φ-values for the protein:



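\[
\Phi_I = \frac{\Delta\Delta G_I - \Delta\Delta G_U}{\Delta\Delta G_F - \Delta\Delta G_U},
\qquad
\Phi_{TS} = \frac{\Delta\Delta G_{TS} - \Delta\Delta G_U}{\Delta\Delta G_F - \Delta\Delta G_U}
\tag{6}
\]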



where Φ_I provides information about the structural composition of
the intermediate state (I), and Φ_TS of the transition state (TS).
In the following we discuss in some detail the results for the
three proteins. Since, as for the first two proteins, the Φ-values
and the native contact probabilities provide somewhat similar
information, for simplicity we show only the results obtained for
the native contact probabilities (for safety we have checked the
Φ-values and determined that similar information is recovered).
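
As an illustration of the two Φ-value definitions in Eq. (6), the following minimal sketch evaluates them from free energy changes upon mutation. The function name and the numerical values are hypothetical and are not taken from the paper; setting the unfolded-state change to zero by default is an assumption of this example.

```python
def phi_values(ddG_I, ddG_TS, ddG_F, ddG_U=0.0):
    """Return (phi_I, phi_TS) as defined in Eq. (6).

    Each argument is the free energy change upon mutation of one state:
    intermediate (I), transition state (TS), folded (F) and unfolded (U).
    """
    denominator = ddG_F - ddG_U
    if denominator == 0.0:
        raise ValueError("mutation does not change the F-U stability")
    phi_I = (ddG_I - ddG_U) / denominator
    phi_TS = (ddG_TS - ddG_U) / denominator
    return phi_I, phi_TS

# Hypothetical mutation: destabilizes I by 0.5, TS by 1.0 and F by 2.0 units.
phi_I, phi_TS = phi_values(ddG_I=0.5, ddG_TS=1.0, ddG_F=2.0)
print(phi_I, phi_TS)
```

With these illustrative numbers the mutated contact would be one quarter formed in the intermediate and half formed in the transition state.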



1. Barnase 

The analysis of experimentally obtained Φ values (Fersht et al.
1992) for Barnase shows that some relevant regions of the structure
are fully unfolded in the intermediate while other regions are
fully folded.

Fig. 6 shows the intermediate and the transition state structure
obtained from the Go-like model. The intermediate shows substantial
structural heterogeneity: there are very high probability values
for interactions within the β-sheet region and its included loops,
and very low values for interactions within the α-helices and their
loops and between the α-helical and β-sheet regions. Some local
short range helical interactions are formed. The transition state
ensemble shows the same structure as the intermediate with the
addition of strong interactions within helices 2 and 3; between
helix 2, helix 3, the first β-strand, and the intervening loops;
and between the second β-strand and the second helix.

Comparing these simulation results with the extensive mutagenesis
studies of reference (Fersht et al. 1992), we observe a good
qualitative agreement. The β-sheet region is highly structured in
the intermediate, as is the core region 3 (consisting of the
packing of loop 3, that joins strands 1 and 2, and of loop 5, that
joins strands 4 and 5, with the other side of the β-sheet). In
agreement with experiments, the earliest formed part of the protein
appears to be the β-sheet region. Also the core region 2 (formed by
the hydrophobic residues from helix 2, helix 3, the first strand,
and the first two loops) is found to be only weakly formed in the
intermediate and the transition state.

There are two minor discrepancies between the Barnase model and the
experimental data. First, we slightly overestimate the formation of
core region 2 in the transition state ensemble. Second, we
underestimate the amount of structure in core region 1 (formed by
the packing of the first helix against a side of the β-sheet) in
both the intermediate and the transition ensemble. In particular,
we under-represent the interaction between helix 1 and the β-sheet
region. The experimentally observed early packing of helix 1
against the rest of the structure is not reproduced by our model.
Clearly there are some important energetic factors which have been
neglected by the simple model. These may be inferred from the
Barnase crystal structure. For example, one can see that helix 1 is
largely solvent exposed, with interactions between it and the
remainder of the protein formed by only five of the eleven helix
residues. 83% of the interactions reside on the hydrophobic
residues PHE7, ALA11, LEU14 and GLN15, and the remaining 17% of the
interactions are formed by the charged residues ASP8 and ASP12,
while the solvent exposed part of the helix is composed of polar
residues. Large stabilizing interactions other than tertiary
(mostly hydrophobic) interactions are neglected in the model, which
is probably responsible for the failure in predicting the formation
of the structural parts involving helix 1. In this structural
detail, it appears that the topological factors are not the leading
determinant of the folding behavior.



2. Ribonuclease H 

Kinetic studies of the wild-type RNase H have shown that an
intermediate state is populated in the folding process, and the
structure of this intermediate has been characterized by
fluorescence and hydrogen exchange methods (Dabora & Marqusee 1994,
Yamasaki, Ogasahara, Yutani, Oobatake & Kanaya 1995, Dabora, Pelton
& Marqusee 1996, Chamberlain, Handel & Marqusee 1996, Raschke &
Marqusee 1997) and by protein engineering (Raschke et al. 1999). In
agreement with this experimental evidence, we find an intermediate
state in the folding process of the RNase H model. Experimental
results indicate that the most stable region of the protein
intermediate involves the α-helix 1, the strand 4, the α-helix 4
and the α-helix 2. Hydrogen exchange experiments have shown that
the α-helix 1 is the region of the protein most protected from
exchange, suggesting that most of the interactions involving the
α-helix 1 are already significantly formed at the intermediate
state of the folding process. The helix 4 and the β-strand 4 are
the next most protected regions, while the α-helix 5 has a low to
moderate level of protection. After the completion of this
intermediate structure, the rate-limiting transition state involves
the ordering of the β-sheet and the α-helix 5. The packing of helix
5 across the sheet is found to be the latest folding event.

The results of the model for RNase H show good agreement with the
experimental evidence. As shown in Fig. 8, we find that the
formation of contacts involving the helix 1 is the earliest event
in the folding process. Contacts arising from the α-helix 4 and the
β-strand 4 are then formed at the intermediate state and
consolidated at the transition state. In agreement with the
experimental results, we find that, at the transition state,
interactions between the α-helix 1, the strand 4 and the rest of
the protein are mostly formed; the α-helix 4 is also well
structured and interactions between the helix 4 and the other parts
of the protein are partly formed. Interactions among the strands
are almost all formed, but the sheet is not yet docked to the
helix 5.



3. CheY

Utilizing protein engineering (Lopez-Hernandez & Serrano 1996,
Lopez-Hernandez, Cronet, Serrano & Muñoz 1997), the transition
state of CheY has been characterized and it can be described as a
combination of two subdomains: the first half of the protein
(subdomain 1), comprising the α-helices 1 and 2 and the β-strands
1-3, is substantially folded whereas the second half (subdomain 2)
is completely disorganized. The helix 1 seems to play the role of a
nucleation site around which subdomain 1 begins to form. Moreover,
an intermediate has been detected at the early stage of the folding
process where all five α-helices are rather structured. The last
two helices, however, are very unstructured in the later occurring
transition state. From this result it has been suggested that a
misfolded species is visited at the beginning of the folding
process.

Our simple model detects two possible intermediates for this
protein. One of them is an "on-route" intermediate that is
short-lived and occurs just before the transition state ensemble (Q
around 0.6 in Fig. 9). Surprisingly, the unfrustrated model is also
able to detect a "misfolded" trap in the folding of CheY. Since
non-native interactions are not allowed in the model, this trap is
a long-lived partially folded state created by the topological
constraints. There is no direct connection between this trap state
and the fully folded state. The structure of this trap is shown in
Fig. 10 and it agrees with the experimental observation of all
helices well structured. Differently from the previously discussed
proteins, the model of CheY seems to have a tendency to first form
a "wrong" part of the protein and, when this happens, a partial
unfolding must occur before the folding can be completed.

Finally, analyzing the transition state structure, we find a good
agreement with the experimental data. As shown in Fig. 10, the
first part of the protein (subdomain 1) is almost fully folded at
the transition state ensemble, while subdomain 2 is completely
unfolded.



IV. CONCLUSIONS 

Recent theoretical studies and experimental results suggest that
the folding mechanism for small fast folding proteins is strongly
determined by the native state topology. The amount of energetic
frustration, arising from the residual conflict among the
amino-acid interactions, appears largely reduced for these proteins
so that topological constraints are important factors in governing
the folding process. Towards exploring this topological influence
in real proteins, we analyzed the folding process of the Go-like
analogues of five real proteins. Since we have used Go-like
potentials, the energetic frustration is effectively removed from
the system, while the native fold topology is taken into account.
It is important to highlight that the results from such studies
exhibit the overall topological features of the folding mechanism,
although we do not expect them to give the precise energetic values
for barrier heights and intermediate state stabilities. For
example, real proteins are not necessarily totally unfrustrated and
they have only to minimize energetic frustration to a sufficiently
reduced level in order to be good folders. Also, as long as
energetic frustration is small enough, creating some heterogeneity
at the native interactions may help to reduce topological
frustration (Plotkin & Onuchic 1999), and that will energetically
favor some contacts over others.

The effective use of a small number of global order parameters as
reaction coordinates, in interpreting real data or studying more
detailed protein folding models, depends critically on the degree
of frustration present in real proteins (Nymeyer et al. 2000).
Since our results show that the general structural features of the
transition state ensemble in real proteins, at least for this class
of fast folding proteins, are reproducible by using a substantially
unfrustrated potential, several different global order parameters
should work to explain the folding mechanism. For this reason, it
should not be a surprise that, utilizing energy landscape ideas and
the funnel concept, some very simple models with approximate order
parameters deter[...] ([...]tein 1999) have been successful in
predicting qualitative features of the transition state ensemble.

Again, we have compared in detail the structure of the transition
state ensemble of the five proteins resulting from our simulations
with experimental data. The agreement between our results and the
experimental data supports the idea that energetic frustration is
indeed sufficiently reduced and the protein folding mechanism, at
least for small globular proteins, is strongly dependent on
topological effects. The structure of the transition state ensemble
of CI2 presents a broad distribution of Φ values, i.e. a reduced
degree of structural polarization, in agreement with predictions
based on the energy landscape theory (see Onuchic et al. (1995),
Onuchic et al. (1996)). On the other hand, the structure of the SH3
transition state ensemble shows a higher degree of polarization.
Nevertheless, by using our simplified Go-like model, we have
reproduced the transition state composition for both proteins,
demonstrating that topology is largely responsible for the observed
experimental differences. The last three proteins we have analyzed
(Barnase, RNase H and CheY) are known to fold through three-state
kinetics, involving the formation of an intermediate structure. Our
Go-like models of these proteins also fold with three-state
kinetics, with intermediates that are analogous to the ones
detected experimentally. This fact suggests that topology is also a
dominant factor in determining the "on-route" intermediates.

APPENDIX: MODEL AND METHOD

In order to investigate how the native state topology affects the
folding of a given protein we follow the dynamics of the protein by
using a Go-like Hamiltonian (Ueda et al. 1975) to describe the
energy of the protein in a given configuration. A Go-like
Hamiltonian takes into account only native interactions, and each
of these interactions enters in the energy balance with the same
weight. It means that the system gains energy as much as any amino
acid pair involved in a native contact is close to its native
configuration, no matter how strong the actual interaction is in
the real protein. Residues in a given protein are represented as
single beads centered in their C_α positions. Adjacent beads are
strung together into a polymer chain by means of bond and angle
interactions, while the geometry of the native state is encoded in
the dihedral angle potential and a non-local potential. The energy
of a configuration Γ of a protein having the configuration Γ_0 as
its native state is thus given by the expression:

\[
E(\Gamma, \Gamma_0) = \sum_{bonds} K_r (r - r_0)^2
 + \sum_{angles} K_\theta (\theta - \theta_0)^2
 + \sum_{dihedral} K_\phi^{(n)} \left[ 1 + \cos\left( n \times (\phi - \phi_0) \right) \right]
 + \sum_{i<j-3} \left\{ \epsilon(i,j) \left[ 5 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
 - 6 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{10} \right]
 + \epsilon_2(i,j) \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} \right\}
\tag{A1}
\]

In the previous expression r and r_0 represent the distances
between two subsequent residues at, respectively, the configuration
Γ and the native state Γ_0. Analogously, θ (θ_0) and φ (φ_0)
represent the angles formed by three subsequent residues and the
dihedral angle defined by four subsequent residues along the chain
at the configuration Γ (Γ_0). The dihedral potential consists of a
sum of two terms for every four adjacent C_α atoms, one with period
n = 1 and one with n = 3. The last term in Eq. (A1), where r_ij is
the distance between residues i and j, contains the non-local
native interactions and a short range repulsive term for non-native
pairs (i.e. ε(i,j) = constant > 0 and ε_2(i,j) = 0 if i-j is a
native pair while ε(i,j) = 0 and ε_2(i,j) = constant > 0 if i-j is
a non-native pair). The parameter σ_ij is taken equal to the i-j
distance at the native state for native interactions, while
σ_ij = 4 Å for non-native (i.e. repulsive) interactions. Parameters
K_r, K_θ, K_φ, ε weight the relative strength of each kind of
interaction entering in the energy and they are taken to be
K_r = 100ε, K_θ = 20ε, K_φ^(1) = ε and K_φ^(3) = 0.5ε. With this
choice of the parameters we found that the stabilizing energy
residing in the tertiary contacts is approximately twice the
stabilizing energy residing in the torsional degrees of freedom.
This balance among the energy terms is optimal to study the folding
of our Go-like protein models. The native contact map of a protein
is derived with the CSU software, based upon the approach developed
in ref. (Sobolev, Wade, Vriend & Edelman 1996). Native contacts
between pairs of residues (i,j) with j ≤ i + 3 are discarded from
the native map as any three and four subsequent residues are
already interacting in the angle and dihedral terms. A contact
between two residues is considered formed if the distance between
the C_α's is shorter than γ times their native distance σ_ij. It
has been shown (Onuchic et al. 1999) that the results are not
strongly dependent on the choice made for the cut-off distance γ.
In this work we used γ = 1.2. We have used Molecular Dynamics
(entailing the numerical integration of Newton's laws of motion)
for simulating the kinetics of the protein models. We employed the
simulation package AMBER (Version 4.1) (Pearlman, Case, Caldwell,
Ross, Cheatham, Ferguson, Singh, Weiner & Kollman 1995) at constant
temperature, i.e. using the Berendsen algorithm for coupling the
system to an external bath (Berendsen, Postma, van Gunsteren,
DiNola & Haak 1984). Both temperature and energy are measured in
units of the folding temperature T_f in the simulations.
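
The non-local term of Eq. (A1) can be sketched as follows, assuming the 10-12 form for native pairs and a purely repulsive twelfth-power term for all other pairs. This is an illustrative sketch, not the AMBER implementation used in this work: the function name, the choice eps = eps2 = 1 (the text only requires positive constants), the minimum sequence separation passed as a parameter and the toy coordinates are all assumptions.

```python
import math

SIGMA_REPULSIVE = 4.0  # Angstrom, sigma_ij quoted in the text for non-native pairs

def nonlocal_energy(coords, native, eps=1.0, eps2=1.0, min_sep=4):
    """Non-local (last) term of the Go-like energy.

    coords:  list of (x, y, z) bead positions, one bead per residue.
    native:  dict mapping a native pair (i, j), i < j, to its native distance.
    min_sep: smallest sequence separation j - i included in the sum
             (assumed here; closer pairs are left to angle and dihedral terms).
    """
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            r = math.dist(coords[i], coords[j])
            if (i, j) in native:
                x = native[(i, j)] / r
                # Attractive well with its minimum of depth eps at r = sigma_ij.
                energy += eps * (5.0 * x ** 12 - 6.0 * x ** 10)
            else:
                # Short range repulsion for non-native pairs.
                energy += eps2 * (SIGMA_REPULSIVE / r) ** 12
    return energy

# Toy chain (hypothetical): five beads on a line, one native pair at its native distance.
native = {(0, 4): 16.0}
coords = [(4.0 * k, 0.0, 0.0) for k in range(5)]
e_native = nonlocal_energy(coords, native)
e_repulsive = nonlocal_energy(coords, {})
print(e_native, e_repulsive)
```

At its native distance a native pair contributes an energy of minus eps, which is the sense in which every native contact enters with the same weight; the same pair treated as non-native contributes a small positive repulsion.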

For each protein model, several constant temperature simulations
were made and combined using the WHAM algorithm (Ferrenberg &
Swendsen 1988, Ferrenberg & Swendsen 1989, Swendsen 1993) to
generate a specific heat profile versus temperature and a free
energy F(Q) as a function of the folding reaction coordinate Q.
This algorithm is based on the fact that the logarithm of the
probability distribution P(Q) of the values taken by a certain
variable Q (e.g. the order parameter) at fixed temperature T may
serve as an estimate for the free energy profile F(Q) at that
temperature. In fact, the probability to have a certain value Q_1
for the variable Q, at temperature T = 1/β, in the canonical
ensemble is given by:



\[
P_\beta(Q_1) = \frac{W(Q_1)\, e^{-\beta E(Q_1)}}{Z_\beta}
\tag{A2}
\]



where W(Q) is the density of configurations at a point Q in the
configurational space, Z_β is the canonical partition function at
temperature T = 1/β and E(Q) is the energy of the system at the
value Q of the reaction coordinate (see footnote 1). Since the free
energy F is



\[
F(Q) = E(Q) - T S(Q)
\tag{A3}
\]



and the entropy S(Q) is related to the configurational density W(Q)

\[
W(Q) = e^{S(Q)/k}
\tag{A4}
\]



where k is the Boltzmann constant, it follows that

\[
\frac{P_\beta(Q_1)}{P_\beta(Q_2)} = \frac{e^{-\beta F(Q_1)}}{e^{-\beta F(Q_2)}}
\tag{A5}
\]



and free energy differences can be computed by

\[
\beta \left( F(Q_1) - F(Q_2) \right) = \log \frac{P_\beta(Q_2)}{P_\beta(Q_1)}
\tag{A6}
\]

Footnote 1. Since our model is almost energetically unfrustrated,
the energy fluctuations for a set of configurations with fixed Q
are strongly reduced, such that the energy in a given configuration
could be considered as a function of Q.

By using the procedure of refs. (Ferrenberg & Swendsen 1988,
Ferrenberg & Swendsen 1989, Swendsen 1993), data from a finite set
of simulations can be used to obtain complete thermodynamic
information over a large parameter region.
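
The relation between sampled probabilities and free energy differences can be illustrated for a single temperature with the minimal sketch below. It builds a histogram of Q values from one trajectory and reports the free energy of each bin relative to the most populated one, in units of kT. It does not implement the multi-temperature combination performed by WHAM; the function name, the binning and the sample values are hypothetical.

```python
import math
from collections import Counter

def free_energy_profile(q_samples, n_bins=10):
    """Relative free energy beta*F per bin from samples of Q in [0, 1].

    The most populated bin is used as reference, so that
    beta * (F_k - F_ref) = log(N_ref / N_k) for bin counts N.
    Bins that were never visited are omitted.
    """
    if not q_samples:
        raise ValueError("no samples given")
    counts = Counter(min(int(q * n_bins), n_bins - 1) for q in q_samples)
    n_ref = max(counts.values())
    return {k: math.log(n_ref / n) for k, n in sorted(counts.items())}

# Hypothetical two-state trajectory: two equally populated basins and a rare barrier.
q_samples = [0.15] * 8 + [0.55] * 1 + [0.95] * 8
profile = free_energy_profile(q_samples)
print(profile)
```

In this toy example the two basins have equal free energy and the sparsely visited middle bin sits log(8) higher, which mimics the shape of a two-state profile at the folding temperature.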

Probability distributions are obtained by sampling the 
configurational space during Molecular Dynamics simu- 
lations. 

For the smaller proteins (CI2 and SH3) we have determined the
errors on the estimates of the transition temperature and contact
probabilities (or Φ values). This has been accomplished by
computing these quantities from several (more than 10) uncorrelated
sets of simulations. We found that the standard deviation for each
single contact probability is 0.06 for CI2 and 0.05 for SH3, while
the transition temperature is determined in both cases with an
uncertainty smaller than 0.5%. These errors are obtained using
about 200 uncorrelated conformations in the transition state
ensemble. Since Barnase, RNase H and CheY have two to three times
the number of tertiary contacts of SH3 and CI2, in order to have
appropriate statistics, we have sampled about 500 uncorrelated
conformations (thermally weighted) for every transition state
ensemble or intermediate.


Captions to the figures



Fig. 1. (a) Free energy F(Q) as a function of the reaction
coordinate Q around the folding temperature for the model of CI2.
Free energies are measured in units of k_B T_f, where T_f is the
folding temperature. The unfolded, folded and transition state
regions are shown in the light blue shaded areas. (b) A typical
sample simulation at a temperature around the folding temperature.
The reaction coordinate Q as a function of time (measured in
arbitrary units of molecular dynamics steps) is shown. The
two-state behaviour is apparent from the data. The unfolded and
folded states are equally populated at the folding temperature.
(c) Heat capacity as a function of the temperature (in units of the
folding temperature).

Fig. 2. The results for the transition state structure from the
simulations for CI2. The probability of native contact formation at
the transition state (left panel), and bond Φ-values (right panel)
are shown. Different colors indicate different values from 0 to 1,
as quantified by the color scale. The α-helix, the interactions
between the strands 4 and 5, and the minicore (i.e. interactions
between residues 32, 38 and 50) are the parts of the structure
formed with the highest probability, although they are not fully
formed. Overall, the transition state ensemble appears as an
expanded version of the native state where most contacts have a
similar probability of participation, but some interactions are
less likely to occur. These results agree with the transition state
structure experimentally obtained.

Fig. 3. (a) Free energy F(Q) as a function of the reaction
coordinate Q for a set of temperatures around the folding
temperature. Free energies are measured in units of k_B T_f. The
choices for the unfolded, folded and transition state regions are
marked as shaded regions. (b) The reaction coordinate Q as a
function of time (in units of molecular dynamics steps), from a
typical sample simulation around the folding temperature. As in
Fig. 1, the two-state behaviour is apparent. At the transition
temperature the model protein has equal probability to be found in
the unfolded or in the folded state. (c) Heat capacity as a
function of the temperature, in units of the folding temperature.

Fig. 4. The transition state structure as obtained from the
simulations for SH3. The left panel represents the probability for
a native contact to be formed at the transition state, while the
right panel shows the results for bond Φ-values. Different colors
indicate different values from 0 to 1, as quantified by the color
scale. The diverging turn and distal loop are marked on the contact
map. The interactions within and between these two parts of the
protein chain appear to be formed with high probability. The
interactions between the two strands joined by the distal loop are
partially formed, while the contacts involving the first 20
residues do not contribute to the transition state structure. This
description of the transition state is in agreement with
experimental results.

Fig. 5. (a) Free energy F(Q) of the Barnase protein as a function
of the reaction coordinate Q around the folding temperature. Free
energies are measured in units of k_B T_f. The unfolded, folded and
intermediate state regions are marked in green, while the tops of
the two barriers are marked in light blue. The local minimum in the
free energy profile between the unfolded and folded minima locates
the folding intermediate state. The presence of a folding
intermediate state is also evident from panel (b), where the order
parameter Q is plotted as a function of time for a typical
molecular dynamics simulation around the folding temperature. In
the interval Q ∈ (0.4, 0.5), the same state (i.e. with the same
average structure) is visited both from the unfolded and folded
structures.

Fig. 6. The probability of native contact formation for the
intermediate (left panel) and transition state (right panel)
structures as obtained from our simulations of Barnase. Different
colors indicate different values from 0 to 1, as quantified by the
color scale. The earliest formed part of the protein appears to be
the β-sheet region, in agreement with experimental results. The
core 3 (formed by the packing of loops 3 and 5 against the β-sheet)
is formed at the intermediate and transition state, while core 1
(the packing of the helix 1 against the β-sheet) and the core 2
(the interactions between the hydrophobic residues from the helices
2 and 3, the strand 1, and the first two loops) start to form only
after the transition state. The formation of the α-helix 1 occurs
as a late event of the folding in our simulations, while from
experimental results it seems to be already formed at the
intermediate and transition state. The early formation of the
α-helix is most probably due to energetic factors rather than to
topological requirements (and is then beyond the predictive
capability of this model), as detailed in the text.

Fig. 7. (a) Free energy F(Q) of the model of RNase H as a function
of the reaction coordinate Q around the folding temperature. Free
energies are measured in units of k_B T_f. The regions
corresponding to the unfolded, folded and intermediate state are
marked in green, while the tops of the two barriers are marked in
light blue. A folding intermediate is detected as a local minimum
in the free energy between the unfolded and folded minima. In panel
(b) the fraction of native contacts formed, Q, is plotted versus
the simulation time for a sample of our simulations (at a
temperature T = 0.99 T_f) where the transition from unfolded to
folded state is observed. The local minimum of panel (a)
corresponds to a transiently populated intermediate (located at Q
around 0.4) that later evolves to the fully folded state.

Fig. 8. The probability of native contact formation at the
intermediate (left panel) and transition state (right panel)
structure, as observed for the RNase H model. Different colors
indicate different values from 0 to 1, as quantified by the color
scale. In agreement with experimental results, we found that
interactions involving the α-helix 1 are the first formed in the
folding process. Contacts between the α-helix 1 and the strand 4
are formed with high probability at the intermediate. Also the
α-helix 4 is well structured and the β-sheet is partly formed.
These interactions strengthen at the transition state, where the
β-sheet is also almost completely formed, while the packing of
helix 5 across the sheet is not yet accomplished.

Fig. 9. (a) Free energy F(Q) profile for the model of CheY plotted
as a function of the reaction coordinate Q for a set of
temperatures around the folding temperature. Free energies are
measured in units of k_B T_f. Differently from the corresponding
figures of Barnase (Fig. 5) and RNase H (Fig. 7), two different
structures are populated between the folded and unfolded states. In
addition to the "on-route" intermediate state (marked in green, as
are the regions corresponding to the folded and unfolded states), a
"misfolded" intermediate structure (marked in brown at Q around
0.4) is transiently visited from the unfolded state. The tops of
the two barriers are marked in light blue. In agreement with
experimental results, we found that in this "misfolded" structure
all five α-helices are rather structured while, in the later
occurring "on-route" intermediate and transition state ensemble,
the helices 4-5 are completely unstructured (see Fig. 10). Panel
(b) shows a typical sample of the simulation around the folding
temperature, in a region where the folding occurs. The first
transiently populated intermediate state corresponds to a structure
where all the helices are formed. Before proceeding to the folded
state, a partial unfolding occurs.

Fig. 10. The probability that the native CheY contacts
are formed in the "misfolded" intermediate (left
panel) and transition state (right panel) for the model
protein. Different colors indicate different values from
0 to 1, as quantified by the color scale. In agreement with
experimental data, all the helices are mostly formed in
the transiently populated "misfolded" structure, while
helices 4 and 5 are rather unstructured at the transition
state. The two subdomains experimentally detected
in the CheY transition state (Lopez-Hernandez & Serrano
1996, Lopez-Hernandez et al. 1997) are evident in
the figure: the first part of the protein (all interactions
arising from the α-helices 1-2 and the β-strands 1-3)
is folded, while the second part (interactions among the
α-helices 4-5 and the β-strands 4-5) is completely unfolded.
Helix 3 is structured, but the interactions
between helix 3 and the rest of the protein are not
completely formed.
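
A contact probability map of this kind can be sketched as follows. The example assumes, as is common for Go-like models, that Q is the fraction of native contacts formed and that an ensemble (for instance an intermediate or the transition state) is selected as the set of sampled conformations whose Q falls inside a window. The function names, the window bounds and the toy frames are hypothetical and serve only as an illustration.

```python
def fraction_native(frame, native_contacts):
    """Q for one conformation: fraction of native contacts that are formed."""
    return sum(c in frame for c in native_contacts) / len(native_contacts)


def ensemble_contact_probabilities(frames, native_contacts, q_min, q_max):
    """Probability of each native contact over frames with q_min <= Q <= q_max."""
    selected = [
        f for f in frames
        if q_min <= fraction_native(f, native_contacts) <= q_max
    ]
    if not selected:
        raise ValueError("no frames fall inside the requested Q window")
    return {
        c: sum(c in f for f in selected) / len(selected)
        for c in native_contacts
    }


# Toy data (hypothetical): each frame is the set of contacts formed in it.
native = [(1, 5), (2, 8), (3, 9), (4, 10)]
frames = [
    {(1, 5), (2, 8)},
    {(1, 5), (3, 9)},
    {(1, 5), (2, 8), (3, 9), (4, 10)},
    set(),
]
probs = ensemble_contact_probabilities(frames, native, 0.4, 0.6)
for contact, p in probs.items():
    print(contact, p)
```

Only the two frames with Q = 0.5 enter the window, so a contact present in both gets probability 1 and a contact present in neither gets probability 0.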






Alm, E. & Baker, D. (1999). Prediction of protein-folding
mechanisms from free-energy landscapes derived from native
structures, Proc. Natl. Acad. Sci. USA 96: 11305-11310.

Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F.,
DiNola, A. & Haak, J. R. (1984). Molecular dynamics with 
coupling to an external bath, J. Chem. Phys. 81(8): 3684- 
3690. 

Betancourt, M. R. & Onuchic, J. N. (1995). Kinetics of pro- 
teinlike models: The energy landscape factors that determine 
folding, J. Chem. Phys. 103: 773-787. 

Boczko, E. M. & Brooks III, C. L. (1995). First-principles 
calculation of the folding free energy of a three-helix bundle 
protein, Science 269: 393-396. 

Bryngelson, J. D., Onuchic, J. N., Socci, N. D. & Wolynes, 
P. G. (1995). Funnels, pathways and the energy landscape 
of protein folding, Proteins: Struct. Funct. Genet. 21: 167-195.

Bryngelson, J. D. & Wolynes, P. G. (1987). Spin glasses and 
the statistical mechanics of protein folding, Proc. Natl. Acad.
Sci. USA 84: 7524-7528.

Bryngelson, J. D. & Wolynes, P. G. (1989). Intermediates and 
barrier crossing in a random energy model (with applications 
to protein folding), J. Phys. Chem. 93: 6902-6915. 

Burton, R. E., Huang, G. S., Daugherty, M. A., Calderone, 
T. L. & Oas, T. G. (1997). The energy landscape of a 
fast-folding protein mapped by Ala-Gly substitutions, Nature
Struct. Biol. 4: 305-310. 

Chamberlain, A. K., Handel, T. M. & Marqusee, S. (1996).
Detection of rare partially folded molecules in equilibrium
with the native conformation of RNase H, Nature Struct.
Biol. 3: 782-787.

Chan, H. S. (1998). Matching speed and locality, Nature
392: 761-763.

Chiti, F., Taddei, N., White, P. M., Bucciantini, M., 
Magherini, F., Stefani, M. & Dobson, C. M. (1999). Muta- 
tional analysis of acylphosphatase suggests the importance of 
topology and contact order in protein folding, Nature Struct.
Biol. 6: 1005-1009.

Clementi, C., Jennings, P. A. & Onuchic, J. N. (1999). How
native state topology affects the folding of dihydrofolate reductase
and interleukin-1β, Proc. Natl. Acad. Sci. USA,
submitted.

Dabora, J. M. & Marqusee, S. (1994). Equilibrium unfolding
of Escherichia coli ribonuclease H: characterization of a
partially folded state, Protein Science 3: 1401-1408.

Dabora, J. M., Pelton, J. G. & Marqusee, S. (1996). Structure
of the acid state of Escherichia coli ribonuclease HI, Biochemistry
35: 11951-11958.

Debe, D. A., Carlson, M. J. & Goddard, W. A. (1999). 
The topomer-sampling model of protein folding, Proc. Natl. 
Acad. Sci. USA 96: 2596-2601. 

Dill, K. A. & Chan, H. S. (1997). From Levinthal to pathways
to funnels, Nature Struct. Biol. 4: 10-19. 

Ferrenberg, A. M. & Swendsen, R. H. (1988). New Monte
Carlo technique for studying phase transitions, Phys. Rev.
Lett. 61: 2635-2638.

Ferrenberg, A. M. & Swendsen, R. H. (1989). Optimized
Monte Carlo data analysis, Phys. Rev. Lett. 63: 1195-1198.

Fersht, A. R. (1994). Characterizing transition states in pro- 
tein folding: an essential step in the puzzle, Curr. Opinion 
Struct. Biol. 5: 79-84. 

Fersht, A. R., Matouschek, A. & Serrano, L. (1992). The folding
of an enzyme. I. Theory of protein engineering analysis
of stability and pathway of protein folding, J. Mol. Biol.
224: 771-782.

Galzitskaya, O. V. & Finkelstein, A. V. (1999). A theoreti- 
cal search for folding/unfolding nuclei in three-dimensional 
protein structures, Proc. Natl. Acad. Sci. USA 96: 11299- 
11304. 

Goldstein, R., Luthey-Schulten, Z. A. & Wolynes, P. G.
(1992). Protein tertiary structure recognition using optimized
Hamiltonians with local interactions, Proc. Natl. Acad.
Sci. USA 89: 9029-9033.

Grantcharova, V., Riddle, D., Santiago, J. & Baker, D. (1998).
Important role of hydrogen bonds in the structurally polarized
transition state for folding of the src SH3 domain, Nature
Struct. Biol. 5: 714-720.

Itzhaki, L. S., Otzen, D. E. & Fersht, A. R. (1995). The struc- 
ture of the transition state for folding of chymotrypsin in- 
hibitor 2 analysed by protein engineering methods: evidence 
for a nucleation-condensation mechanism for protein folding, 
J. Mol. Biol. 254: 260-288. 

Jackson, S. E., elMasry, N. & Fersht, A. R. (1993). Structure 
of the hydrophobic core in the transition state for folding of 
chymotrypsin inhibitor 2: a critical test of the protein engi- 
neering method of analysis, Biochemistry 32: 11270-11278. 

Jackson, S. E. & Fersht, A. R. (1991a). Folding of chymotrypsin
inhibitor 2. 2. Influence of proline isomerization
on the folding kinetics and thermodynamic characterization
of the transition state of folding, Biochemistry 30: 10436-10443.

Jackson, S. E. & Fersht, A. R. (1991b). Folding of chymotrypsin
inhibitor 2. 1. Evidence for a two-state transition,
Biochemistry 30: 10428-10435.

Jackson, S. E., Moracci, M., elMasry, N., Johnson, C. & Fer- 
sht, A. R. (1993). The effect of cavity creating mutations in 
the hydrophobic core of chymotrypsin inhibitor 2, Biochem- 
istry 32: 11262-11269. 

Kazmirski, S. L., Li, A. & Daggett, V. (1999). Analysis meth- 
ods for comparison of multiple molecular dynamics trajecto- 
ries: applications to protein unfolding pathways and dena- 
tured ensembles, J. Mol. Biol. 290: 283-304. 

Killick, T. R., Freund, S. M. V. & Fersht, A. R. (1999). Real- 
time NMR studies on a transient folding intermediate of 
barstar, Protein Science 8: 1286-1291. 

Kim, D., Gu, H. & Baker, D. (1998). The sequences of small 
proteins are not extensively optimized for rapid folding by 
natural selection, Proc. Natl. Acad. Sci. USA 95: 4982-4986. 

Kim, D., Yi, Q., Gladwin, S., Goldberg, J. & Baker, D. 
(1998). The single helix in protein L is largely disrupted at
the rate-limiting step in folding, J. Mol. Biol. 284: 807-815. 

Klimov, D. K. & Thirumalai, D. (1996). Factors governing 
the foldability of proteins, Proteins: Struct. Funct. Genet.
26: 411-441. 

Klimov, D. K. & Thirumalai, D. (1998). Lattice models for
proteins reveal multiple folding nuclei for nucleation-collapse
mechanism, J. Mol. Biol. 282: 471-492.

Lazaridis, T. & Karplus, M. (1997). New view of protein folding
reconciled with the old through multiple unfolding simulations,
Science 278: 1928-1931.

Leopold, P. E., Montal, M. & Onuchic, J. N. (1992). Protein 
folding funnels: Kinetic pathways through compact conformational
space, Proc. Natl. Acad. Sci. USA 89: 8721-8725.

Li, A. & Daggett, V. (1996). Identification and characteri- 
zation of the unfolding transition state of chymotrypsin in- 
hibitor 2 by molecular dynamics simulations, J. Mol. Biol. 
257: 412-429. 

Lopez-Hernandez, E., Cronet, P., Serrano, L. & Munoz, V. 
(1997). Folding kinetics of CheY mutants with enhanced native
α-helix propensities, J. Mol. Biol. 266: 610-620.

Lopez-Hernandez, E. & Serrano, L. (1996). Structure of the 
transition state for folding of the 129 aa protein CheY re- 
sembles that of a smaller protein, CI-2, Folding & Design 
1: 43-55. 

Martinez, J. C. & Serrano, L. (1999). The folding transition
state between SH3 domains is conformationally restricted and
evolutionarily conserved, Nature Struct. Biol. 6: 1010-1016.

Martinez, J., Pisabarro, M. & Serrano, L. (1998). Obligatory
steps in protein folding and the conformational diversity of
the transition state, Nature Struct. Biol. 5: 721-729.

Mateu, M. G., Del Pino, M. S. & Fersht, A. R. (1999). Mechanism
of folding and assembly of a small tetrameric protein
domain from tumor suppressor p53, Nature Struct. Biol.
6: 191-198.

Matouschek, A., Otzen, D. E., Itzhaki, L. S., Jackson, S. E.
& Fersht, A. R. (1995). Movement of the position of the 
transition state in protein folding, Biochemistry 34: 13656- 
13662. 

Micheletti, C., Banavar, J., Maritan, A. & Seno, F. (1999).
Protein structures and optimal folding emerging from a ge- 
ometrical variational principle, Phys. Rev. Lett. 82: 3372- 
3375. 

Mirny, L. A., Abkevich, V. & Shakhnovich, E. I. (1996). Uni- 
versality and diversity of the protein folding scenarios: A 
comprehensive analysis with the aid of a lattice model, Fold- 
ing & Design 1: 103-116. 

Munoz, V. & Eaton, W. A. (1999). A simple model for calcu- 
lating the kinetics of protein folding from three-dimensional 
structures, Proc. Natl. Acad. Sci. USA 96: 11311-11316. 

Nelson, E. D., Eyck, L. T. & Onuchic, J. N. (1997). Symmetry 
and kinetic optimization of proteinlike heteropolymers, Phys. 
Rev. Lett. 79: 3534-3537. 

Nelson, E. D. & Onuchic, J. N. (1998). Proposed mechanism 
for stability of proteins to evolutionary mutations, Proc. Natl.
Acad. Sci. USA 95: 10682-10686. 

Nymeyer, H., Garcia, A. E. & Onuchic, J. N. (1998). Folding 
funnels and frustration in off-lattice minimalist models, Proc. 
Natl. Acad. Sci. USA 95: 5921-5928.

Nymeyer, H., Socci, N. D. & Onuchic, J. N. (2000). Landscape 
approaches for determining the ensemble of folding transition 
states: Success and failure hinge on the degree of frustration, 
Proc. Natl. Acad. Sci. USA, in press.

Onuchic, J. N., Luthey-Schulten, Z. & Wolynes, P. G. (1997). 
Theory of protein folding: the energy landscape perspective, 
Annu. Rev. Phys. Chem. 48: 545-600. 



Onuchic, J. N., Nymeyer, H., Garcia, A. E., Chahine, J. & 
Socci, N. D. (1999). The energy landscape theory of protein 
folding: Insights into folding mechanisms and scenarios, Adv.
Protein Chem., in press.

Onuchic, J. N., Socci, N. D., Luthey-Schulten, Z. A. & 
Wolynes, P. G. (1996). Protein folding funnels: The nature of 
the transition state ensemble, Folding & Design 1: 441-450. 

Onuchic, J. N., Wolynes, P. G., Luthey-Schulten, Z. A. & 
Socci, N. D. (1995). Towards an outline of the topography of 
a realistic protein folding funnel, Proc. Natl. Acad. Sci. USA
92: 3626-3630. 

Otzen, D. & Fersht, A. (1995). Side-chain determinants of 
β-sheet stability, Biochemistry 34: 5718-5724.

Pande, V. & Rokhsar, D. (1999). Folding pathway of a lattice 
model for proteins, Proc. Natl. Acad. Sci. USA 96: 1273- 
1278. 

Pearlman, D. A., Case, D. A., Caldwell, J. W., Ross, W. S., 
Cheatham, T. E., Ferguson, D. M., Singh, U. C., Weiner, P.
& Kollman, P. A. (1995). AMBER, V. 4.1. 

Plaxco, K. W., Simons, K. T. & Baker, D. (1998). Contact 
order, transition state placement and the refolding rates of 
single domain proteins, J. Mol. Biol. 277: 985-994. 

Plotkin, S. S. & Onuchic, J. N. (1999). Investigation of routes 
and funnels in protein folding by free energy functional methods,
Proc. Natl. Acad. Sci. USA, submitted.

Raschke, T. M., Kho, J. & Marqusee, S. (1999). Confirmation 
of the hierarchical folding of RNase H: a protein engineering 
study, Nature Struct. Biol. 6: 825-831. 

Raschke, T. M. & Marqusee, S. (1997). The kinetic folding intermediate
of ribonuclease H resembles the acid molten globule
and partially unfolded molecules detected under native
conditions, Nature Struct. Biol. 4: 298-304.

Riddle, D. S., Grantcharova, V. P., Santiago, J. V., Alm, E.,
Ruczinski, I. & Baker, D. (1999). Experiment and theory
highlight role of native state topology in SH3 folding, Nature
Struct. Biol. 6: 1016-1024.

Scalley, M. L. & Baker, D. (1997). Protein folding kinetics exhibit
an Arrhenius temperature dependence when corrected
for the temperature dependence of protein stability, Proc.
Natl. Acad. Sci. USA 94: 10636-10640.

Scheraga, H. A. (1992). Contribution of physical chemistry to 
an understanding of protein structure and function, Protein 
Science 1: 691. 

Shea, J. E., Nochomovitz, Y. D., Guo, Z. Y. & Brooks III, 
C. L. (1998). Exploring the space of protein folding Hamiltonians:
The balance of forces in a minimalist beta-barrel
model, J. Chem. Phys. 109: 2895-2903.

Shea, J. E., Onuchic, J. N. & Brooks III, C. L. (1999). Exploring
the origins of topological frustration: design of a minimally
frustrated model of fragment B of protein A, Proc. Natl.
Acad. Sci. USA 96: 12512-12517.

Sheinerman, F. B. & Brooks III, C. L. (1998a). Calculations 
on folding of segment B1 of streptococcal protein G, J. Mol.
Biol. 278: 439-455. 

Sheinerman, F. B. & Brooks III, C. L. (1998b). Molecular 
picture of folding of a small alpha/beta protein, Proc. Natl. 
Acad. Sci. USA 95: 1562-1567. 

Shoemaker, B. A., Wang, J. & Wolynes, P. G. (1999). Exploring
structures in protein folding funnels with free energy
functionals: the transition state ensemble, J. Mol. Biol.
287: 675-694.

Shoemaker, B. A. & Wolynes, P. G. (1999). Exploring structures
in protein folding funnels with free energy functionals:
the denatured ensemble, J. Mol. Biol. 287: 657-674.

Sobolev, V., Wade, R., Vriend, G. & Edelman, M. (1996). 
Molecular docking using surface complementarity, Proteins 
25: 120-129. 

Socci, N. D., Nymeyer, H. & Onuchic, J. N. (1997). Exploring 
the protein folding landscape, Physica D 107: 366-382. 

Socci, N. D. & Onuchic, J. N. (1995). Kinetic and thermodynamic
analysis of proteinlike heteropolymers: Monte Carlo
histogram technique, J. Chem. Phys. 103: 4732-4744.

Socci, N. D., Onuchic, J. N. & Wolynes, P. G. (1996). Diffu- 
sive dynamics of the reaction coordinate for protein folding 
funnels, J. Chem. Phys. 104: 5860-5868. 

Swendsen, R. H. (1993). Modern methods of analyzing Monte
Carlo computer simulations, Physica A 194: 53-62.

Ueda, Y., Taketomi, H. & Go, N. (1975). Studies on protein 
folding, unfolding and fluctuations by computer simulation. 
I. The effects of specific amino acid sequence represented by 
specific inter-unit interactions, Int. J. Peptide Res. 7: 445- 
459. 

Villegas, V., Martinez, J., Aviles, F. & Serrano, L. (1998). 
Structure of the transition state in the folding process of 
human procarboxypeptidase A2 activation domain, J. Mol. 
Biol. 283: 1027-1036. 

Wolynes, P. G. (1996). Symmetry and the energy landscapes 
of biomolecules, Proc. Natl. Acad. Sci. USA 93: 14249-14255.

Wolynes, P. G., Luthey-Schulten, Z. & Onuchic, J. N. (1996). Fast-folding
experiments and the topography of protein folding
energy landscapes, Chemistry & Biology 3: 425-432.

Yamasaki, K., Ogasahara, K., Yutani, K., Oobatake, M. &
Kanaya, S. (1995). Folding pathway of Escherichia coli ribonuclease
HI: a circular dichroism, fluorescence, and NMR
study, Biochemistry 34: 16552-16562.



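FIG. 1. [Plot not reproduced; axis label: T (units of T_f).]

FIG. 2. [Plot not reproduced; color scale from 0.1 to 1; labels: strand, helix.]

FIG. 3. [Plot not reproduced; axis labels: Q; T (units of T_f).]

FIG. 4. [Plot not reproduced.]

FIG. 5. [Plot not reproduced; axis label: Q.]

FIG. 6. [Plot not reproduced.]

FIG. 7. [Plot not reproduced; axis label: Q.]

FIG. 8. [Plot not reproduced.]

FIG. 9. [Plot not reproduced; axis label: Q.]

FIG. 10. [Plot not reproduced.]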



